Refactor to use an array backend Part 1 #104

mratsim · 2020-06-13T15:54:59Z

This is a refactoring to bring use an array backend. The actual bit representation will be 100%-bit compatible with the previous one (i.e. casting is possible). And as Stint integers are as if uint64 were extended to uintXXX it should be compatible with the newly exposed iXXX from LLVM/Clang _ExtInt http://blog.llvm.org/2020/04/the-new-clang-extint-feature-provides.html

Status

The status takes into account passing a whole test suite file.
Very light changes may be needed if the test suite was using internal proc or representations like swapBytes or UintImpl
The test suite is run at both compile-time and run-time.
Tests were only done on littleEndian architectures

Unsigned integers
- Bitops2
- Endianness
- Endians2
  - swapBytes removed
  - toBE / fromBE / toLE / fromLE, deprecated
- Bitwise operations (or, and, not, xor, shl, shr, ...)
  - The shift operations should be tested on bigEndian arch they are much more error prone
    when using an array backend than a recursive backend
  - isEven / isOdd for normal integers removed
- Comparison
- Add / Sub
- Mul / Div
  - Mul is implemented,should be OK (requirement for parsing)
- Exponentiation
- Modular Arithmetic
- IO
  - Most of what is necessary for debuging unsigned is implemented
Signed integers
- Bitops2
- Endianness
- Endians2
- Bitwise operations (or, and, not, xor, shl, shr, ...)
- Comparison
- Add / Sub
- Mul / Div
- Exponentiation
- Modular Arithmetic
- IO

Overview

This refactor will bring significant speed and code size improvements that were tested in Constantine.

It also uses the `{.push raises: [].} pragmas to enforce error handling

Addition / Substraction

The codegen should be optimal in terms of codesize with a single ADD + ADC + ADC + ADC for uint256 on x86
This uses the intrinsics addcarry_u64 and subborrow_u64 for GCC/Clang/MSVC.
With uint128 fallback for ARM/WASM. (TODO, use Clang-only intrinsics __builtin_addc and __builtin_subc)

In particular this solves #87.

Note that it is strongly recommended to use anything but GCC (i.e. Clang, MSVC, ICC) as GCC is the absolute worst compiler for multiprecision arithmetic, see https://gcc.godbolt.org/z/2h768y

#include <stdint.h>
#include <x86intrin.h>

void add256(uint64_t a[4], uint64_t b[4]){
  uint8_t carry = 0;
  for (int i = 0; i < 4; ++i)
    carry = _addcarry_u64(carry, a[i], b[i], &a[i]);
}

GCC

add256:
        movq    (%rsi), %rax
        addq    (%rdi), %rax
        setc    %dl
        movq    %rax, (%rdi)
        movq    8(%rdi), %rax
        addb    $-1, %dl
        adcq    8(%rsi), %rax
        setc    %dl
        movq    %rax, 8(%rdi)
        movq    16(%rdi), %rax
        addb    $-1, %dl
        adcq    16(%rsi), %rax
        setc    %dl
        movq    %rax, 16(%rdi)
        movq    24(%rsi), %rax
        addb    $-1, %dl
        adcq    %rax, 24(%rdi)
        ret

Clang

add256:
        movq    (%rsi), %rax
        addq    %rax, (%rdi)
        movq    8(%rsi), %rax
        adcq    %rax, 8(%rdi)
        movq    16(%rsi), %rax
        adcq    %rax, 16(%rdi)
        movq    24(%rsi), %rax
        adcq    %rax, 24(%rdi)
        retq

Multiplication

Multiplication uses the Comba multiplication technique (a.k.a Product Scanning) that reorder the operations to significantly reduces the number of carries at the price of a temporary buffer of size 3 words and convoluted loop scheduling:

nim-stint/stint/private/uint_mul.nim

Lines 28 to 40 in 01ad29e

    
           var t, u, v = Word(0) 
        
           var z: typeof(r) # zero-init, ensure on stack and removes in-place problems 
        
           staticFor i, 0, min(a.len+b.len, r.len): 
        
             const ib = min(b.len-1, i) 
        
             const ia = i - ib 
        
             staticFor j, 0, min(a.len - ia, ib+1): 
        
               mulAcc(t, u, v, a[ia+j], b[ib-j]) 
        
             z[i] = v 
        
             v = u 
        
             u = t 
        
             t = 0

The benefits of comba multiplication is 30% speed improvement due to better register usage.

Also the code allows mul with high bit truncation, extended precision multiplication (mul(256, 256) -> 512) and also multiplication while only keeping the higher bits so that modular arithmetic can use a more efficient Barret Reduction. This should significantly speed up the EVM.

A specific squaring operation will be added to accelerate exponentiation (~20%)

Modular arithmetic

Modular multiplication will be reimplemented to use Barret reduction
Modular exponentiation will use multiplication+Barret reduction for small exponents or Montgomery reduction with window optimization and potential NAF recoding (Non-Adjacent Form) for larger exponents https://github.com/mratsim/constantine/blob/f8fb54fa/constantine/arithmetic/limbs_montgomery.nim#L533-L576

IO

Serialization from hex will be made significantly faster.
Notation: "w" the number of words in the big int, b the number of bytes to read in the hex strings

BigInt multiplication is O(w²) and for each hex bytes read we do a multiplication for O(b * w²) Instead we can:

replace multiplication by a shift for complexity O(b * w)
Do 2 passes to convert hex to bytearray and then bytearray to Stint O(2b)
Do 1 pass O(b)

Solution 2 will be done (but solution 3 can be kept in mind if 2 passes are a bottleneck, which is very unlikely compared to bigint operations)

Future

The refactoring is incidentally removing all casts which should make JS target easy: Allow JS compilation #16
Also Clang/LLVM ExtInt (wide-integer) will be very interesting for optimized WASM code http://blog.llvm.org/2020/04/the-new-clang-extint-feature-provides.html
Non-power of 2 sizes should be easy to support
As we compile with stackTraces / lineTraces, we might want to deactivate them within Stint as any checks is likely to pollute the carry flag and significantly worsen the codegen in particular in the EVM.

mratsim · 2021-03-24T16:03:05Z

Relevant criticism of the division implementation in the wild: https://skanthak.homepage.t-online.de/division.html

…M ... during compilation)

…l files [skip-ci]

…ntestable in CI, a pain to maintain and an intermediate serialization step instead of casting is cheap

mratsim mentioned this pull request Jun 18, 2020

Add bigints to standard library nim-lang/Nim#14696

Closed

mratsim and others added 22 commits June 12, 2023 17:08

Implement types and uint128 primitives

de87739

Implement comparison

36cc2b2

reimplement bitwise

cbbffe4

Implement multiprecision addition / substraction

206ffa9

Implement multiplication

a0dec54

Passing addition tests (however simple bitwise ops crash the int128 V…

7f6c588

…M ... during compilation)

Fix compiletime primitives to pass all bitwise tests except large shifts

2ac1ee3

Implement toHex/fromHex and fix shl

777a84e

passing compile-time bitwise tests (but not runtime :?)

195480d

Fix noInit issue at runtime, pass the bitwise tests

3df7f38

Fix comparison operators

59bca47

Pass extended precision bitops2 tests

254d4da

Implement exponentiation, test mul, split mul/div tests

dc9e0a4

For division we need internal add/sub/shift/bitwise so create interna…

dd3ab71

…l files [skip-ci]

dump progress

f952314

stash div refactor

c2ed8a4

uint division - compile and pass the single limb tests

53d2fd1

Division/modulo implemented - pass property-based testing vs ttmath

7efa248

Use littleEndian for limb-endianness: bigEndian arch are very rare, u…

4660dfe

…ntestable in CI, a pain to maintain and an intermediate serialization step instead of casting is cheap

Add randomized testing, harden against edge cases

27e9c9e

Add randomized testing vs GMP

bff69b3

rebase and try to make it works with clients

63a3212

jangko force-pushed the complete-refactor branch from 59dd896 to 63a3212 Compare June 12, 2023 13:26

jangko changed the title ~~[WIP] Refactor to use an array backend~~ Refactor to use an array backend Part 1 Jun 12, 2023

jangko added 4 commits June 12, 2023 21:07

let the tests compileable and run

0dc6afe

fix stuint constructor

e5c352f

disable dot borrow temporary

7ce5364

nimvm workaround for primitives

8c5a964

jangko merged commit 8c5a964 into master Jun 21, 2023

jangko deleted the complete-refactor branch June 21, 2023 09:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor to use an array backend Part 1 #104

Refactor to use an array backend Part 1 #104

mratsim commented Jun 13, 2020 •

edited

Loading

mratsim commented Mar 24, 2021

	var t, u, v = Word(0)
	var z: typeof(r) # zero-init, ensure on stack and removes in-place problems

	staticFor i, 0, min(a.len+b.len, r.len):
	const ib = min(b.len-1, i)
	const ia = i - ib
	staticFor j, 0, min(a.len - ia, ib+1):
	mulAcc(t, u, v, a[ia+j], b[ib-j])

	z[i] = v
	v = u
	u = t
	t = 0

Refactor to use an array backend Part 1 #104

Refactor to use an array backend Part 1 #104

Conversation

mratsim commented Jun 13, 2020 • edited Loading

Status

Overview

Addition / Substraction

Multiplication

Modular arithmetic

IO

Future

mratsim commented Mar 24, 2021

mratsim commented Jun 13, 2020 •

edited

Loading